Performance of a Natural Language Processing (NLP) Tool to Extract Pulmonary Function Test (PFT) Reports from Structured and Semistructured Veteran Affairs (VA) Data
Abstract
INTRODUCTION/OBJECTIVE: Pulmonary function tests (PFTs) are objective estimates of lung function, but they are not reliably stored as structured data within the Veteran Health Affairs data systems. The aim of this study was to validate the natural language processing (NLP) tool we developed, which extracts spirometric values and responses to bronchodilator administration, against expert review, and to estimate the number of additional spirometric tests identified beyond the structured data.
METHODS: All patients at seven Veteran Affairs Medical Centers with a diagnostic code for asthma between Jan 1, 2006 and Dec 31, 2012 were included. Evidence of spirometry with a bronchodilator challenge (BDC) was extracted from structured data as well as clinical documents. The NLP tool's performance was compared against a human reference standard using a random sample of 1,001 documents.
RESULTS: In the validation set, NLP demonstrated a precision of 98.9 percent (95 percent confidence interval (CI): 93.9 percent, 99.7 percent), recall of 97.8 percent (95 percent CI: 92.2 percent, 99.7 percent), and an F-measure of 98.3 percent for the forced vital capacity pairs measured before and after bronchodilator administration. For the corresponding forced expiratory volume in one second pairs, it demonstrated a precision of 100 percent (95 percent CI: 96.6 percent, 100 percent), recall of 100 percent (95 percent CI: 96.6 percent, 100 percent), and an F-measure of 100 percent. Application of the NLP tool increased the proportion identified with a complete bronchodilator challenge by 25 percent.
DISCUSSION/CONCLUSION: This technology can improve identification of PFTs for epidemiologic research. Caution must be taken in assuming that a single domain of clinical data can completely capture the scope of a disease, treatment, or clinical test.
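As a quick consistency check on the reported figures, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the abstract's "F-measure" is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract reports F1; it does not name the variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (as fractions).
fvc_f = f_measure(0.989, 0.978)   # forced vital capacity pairs
fev1_f = f_measure(1.0, 1.0)      # forced expiratory volume in one second pairs

print(f"FVC pairs:  {fvc_f:.1%}")
print(f"FEV1 pairs: {fev1_f:.1%}")
```

Under that assumption, both recomputed values round to the figures given in the abstract (98.3 percent and 100 percent).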
Similar resources
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the increasing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
General Symptom Extraction from VA Electronic Medical Notes
There is a need for cataloging signs and symptoms, but not all are documented in structured data. The text of clinical records is an additional source of signs and symptoms. We describe a Natural Language Processing (NLP) technique to identify symptoms from text. Using a human-annotated reference corpus from VA electronic medical notes, we trained and tested an NLP pipeline to identify and cate...
Extracting Imaging Observation Entities in Mammography Reports
Since radiology reports are created as unstructured text, natural language processing (NLP) techniques are needed to extract structured information from them as input to information systems. The goal of this project is to develop NLP methods to extract the Imaging Observations and their modifiers from free-text mammography reports in order to provide structured data to r...
A Clinical Decision Support System for Monitoring Post-Colonoscopy Patient Follow-Up and Scheduling
This paper describes a natural language processing (NLP)-based clinical decision support (CDS) system that is geared towards colon cancer care coordinators as the end users. The system is implemented using a metadata-driven Structured Query Language (SQL) function (discriminant function). For our pilot study, we have developed a training corpus consisting of 2,085 pathology reports from the VA...
Large Scale Clinical Text Processing and Process Optimization
Objective: This tutorial outlines the benefits and challenges of processing large volumes of clinical text with natural language processing (NLP). As NLP becomes more available and is able to tackle more complex problems, the ability to scale to millions of clinical notes must be considered. The Department of Veterans Affairs (VA) has more than 2 billion clinical notes and has developed NLP libr...